Introduction

This tutorial demonstrates how to integrate transcriptomics data with connectomics data to identify neurons expressing specific genes and analyze their connectivity patterns.

Scientific Background

Voltage-gated calcium channels (VGCCs) play crucial roles in neuronal signaling. The Ca-α1T gene encodes a low-voltage-activated (T-type) calcium channel subunit that is particularly important in visual processing circuits.

Key biological insights: - Ca-α1T (FlyBase: FBgn0264386) - T-type calcium channels enable low-threshold spiking and burst firing, inc. calcium spikes in hyperpolarised neurons. - Known to be expressed in the optic lobes, but not in which types originally. - See Ishida et al. (2025) for example in central complex

Tutorial goals: 1. Load transcriptomics data from visual system cell types 2. Identify cell types expressing Ca-α1T 3. Map those cell types to the FAFB connectome 4. Analyze their connectivity patterns 5. Visualise a network of Ca-α1T+ neurons and their strongest synaptic partners

Currently working with:

  • Dataset: fafb_783 (FAFB - Full Adult Fly Brain)
  • Transcriptomics: Visual system cell types (96h development)
  • Data location: gs://sjcabs_2025_data (Google Cloud Storage)

Load Transcriptomics Data

Read Expression Matrix

The expression matrix contains normalized gene expression values for visual system cell types at 96 hours of development.

# Construct path to transcriptomics data
expr_matrix_path <- file.path(data_path, "transcriptomics", "visual_system",
                               "expr_matrix_96h_V1.1.tsv")

# Read expression matrix
# First column is gene names, remaining columns are cell types
if (use_gcs) {
  # Download from GCS
  temp_file <- tempfile(fileext = ".tsv")
  system(paste("gsutil cp", expr_matrix_path, temp_file))
  expr_matrix <- read.delim(temp_file, row.names = 1, check.names = FALSE)
  unlink(temp_file)
} else {
  expr_matrix <- read.delim(expr_matrix_path, row.names = 1, check.names = FALSE)
}

cat("Expression matrix dimensions:\n")
## Expression matrix dimensions:
cat("  Genes:", nrow(expr_matrix), "\n")
##   Genes: 12037
cat("  Cell types:", ncol(expr_matrix), "\n")
##   Cell types: 61
cat("\nFirst few genes and cell types:\n")
## 
## First few genes and cell types:
expr_matrix[1:5, 1:5]
##                        C2    C3   Dm1  Dm10  Dm11
## 128up                0.16  0.14  0.52  0.24  0.20
## 14-3-3epsilon       22.63 22.94 14.94 22.29 25.17
## 14-3-3zeta          20.64 24.02 50.41 21.47 35.20
## 140up                0.09  0.15  0.13  0.20  0.15
## 18SrRNA-Psi:CR41602  0.61  0.56  1.22  0.70  0.45

Read Cell Type Key

The cell type key maps between different naming conventions across datasets: - Nern_type: maleCNS naming - Matsliah_type / Schlegel_type: FAFB/BANC naming

# Construct path to cell type key
cell_type_key_path <- file.path(data_path, "transcriptomics", "visual_system",
                                 "cell_types_V1.1a.tsv")

# Read cell type key
if (use_gcs) {
  temp_file <- tempfile(fileext = ".tsv")
  system(paste("gsutil cp", cell_type_key_path, temp_file))
  cell_type_key <- read.delim(temp_file, stringsAsFactors = FALSE)
  unlink(temp_file)
} else {
  cell_type_key <- read.delim(cell_type_key_path, stringsAsFactors = FALSE)
}

cat("Cell type key dimensions:", nrow(cell_type_key), "cell types\n")
## Cell type key dimensions: 62 cell types
cat("\nColumn names:\n")
## 
## Column names:
print(names(cell_type_key))
## [1] "class1"        "type2"         "subtype2"      "Nern_type"    
## [5] "Matsliah_type" "Schlegel_type"
cat("\nFirst few rows:\n")
## 
## First few rows:
head(cell_type_key, 10)
##    class1 type2 subtype2 Nern_type Matsliah_type Schlegel_type
## 1       N    C2       C2        C2            C2            C2
## 2       N    C3       C3        C3            C3            C3
## 3       N   Dm1      Dm1       Dm1           Dm1           Dm1
## 4       N   Dm2      Dm2       Dm2           Dm2           Dm2
## 5       N   Dm3     Dm3a      Dm3a          Dm3p           Dm3
## 6       N   Dm3     Dm3b      Dm3b          Dm3q           Dm3
## 7       N   Dm4      Dm4       Dm4           Dm4           Dm4
## 8       N   Dm8      Dm8 Dm8a,Dm8b     yDm8,pDm8           Dm8
## 9       N   Dm9      Dm9       Dm9           Dm9           Dm9
## 10      N  Dm10     Dm10      Dm10          Dm10          Dm10

Voltage-Gated Ion Channels Overview

Before focusing on Ca-α1T, let’s explore the expression patterns of voltage-gated ion channels (VGICs) across cell types to provide context.

Identify Voltage-Gated Ion Channel Genes

# Search for voltage-gated ion channel genes
# Look for genes containing typical VGIC patterns:
# - Calcium channels: Ca-alpha, Ca-beta, cacophony (cac)
# - Sodium channels: para, NaCP60E
# - Potassium channels: Shaker, Shaw, Shab, Shal, eag, erg, elk

vgic_patterns <- c(
  "^Ca-alpha",  # Calcium channel alpha subunits
  "^Ca-beta",   # Calcium channel beta subunits
  "^cac",       # Cacophony (voltage-gated Ca channel)
  "^para",      # Paralytic (voltage-gated Na channel)
  "^NaCP",      # Sodium channel proteins
  "^Sh$",       # Shaker (Kv1 family)
  "^Shaw",      # Shaw (Kv3 family)
  "^Shab",      # Shab (Kv2 family)
  "^Shal",      # Shal (Kv4 family)
  "^eag$",      # Ether-a-go-go (Kv10 family)
  "^erg$",      # Erg (Kv11 family)
  "^elk$"       # Elk (Kv12 family)
)

# Find matching genes in expression matrix
vgic_genes <- grep(paste(vgic_patterns, collapse = "|"),
                   rownames(expr_matrix),
                   value = TRUE,
                   ignore.case = FALSE)

cat("Found", length(vgic_genes), "voltage-gated ion channel genes:\n")
## Found 14 voltage-gated ion channel genes:
print(vgic_genes)
##  [1] "Ca-alpha1D" "Ca-alpha1T" "Ca-beta"    "NaCP60E"    "Sh"        
##  [6] "Shab"       "Shal"       "Shaw"       "Shawl"      "cac"       
## [11] "cact"       "cactin"     "eag"        "para"

Extract VGIC Expression Matrix

# Subset expression matrix to VGIC genes
vgic_expr <- expr_matrix[vgic_genes, , drop = FALSE]

cat("\nVGIC expression matrix:\n")
## 
## VGIC expression matrix:
cat("  Genes:", nrow(vgic_expr), "\n")
##   Genes: 14
cat("  Cell types:", ncol(vgic_expr), "\n")
##   Cell types: 61
cat("  Value range:", min(vgic_expr), "to", max(vgic_expr), "\n")
##   Value range: 0 to 134.8
# Show summary statistics per gene
vgic_stats <- data.frame(
  gene = rownames(vgic_expr),
  mean_expr = rowMeans(vgic_expr),
  max_expr = apply(vgic_expr, 1, max),
  cv = apply(vgic_expr, 1, sd) / rowMeans(vgic_expr)  # Coefficient of variation
) %>%
  arrange(desc(mean_expr))

cat("\nTop 10 most highly expressed VGICs:\n")
## 
## Top 10 most highly expressed VGICs:
print(head(vgic_stats, 10))
##                  gene mean_expr max_expr        cv
## Sh                 Sh 31.025410    84.60 0.3743809
## para             para 30.002459   134.80 0.6002796
## Ca-alpha1T Ca-alpha1T  9.244262    72.42 1.3780203
## NaCP60E       NaCP60E  6.684262    33.70 0.7172604
## cac               cac  6.504426    10.80 0.2341405
## Shal             Shal  2.964918     7.38 0.5290207
## eag               eag  2.679180    11.84 0.8033499
## Shab             Shab  2.651148    18.60 1.0590988
## Ca-alpha1D Ca-alpha1D  2.404590     4.87 0.2838044
## Ca-beta       Ca-beta  1.682295     7.81 0.5991748

Heatmap of VGIC Expression

# Prepare data for heatmap
# Log-transform for better visualisation (add pseudocount to avoid log(0))
vgic_expr_log <- log2(vgic_expr + 1)

# Create heatmap using pheatmap
library(pheatmap)
library(viridis)

# Annotate rows by channel type
gene_annotation <- data.frame(
  Channel_Type = case_when(
    grepl("^Ca-", rownames(vgic_expr_log)) ~ "Calcium",
    grepl("^cac", rownames(vgic_expr_log)) ~ "Calcium",
    grepl("^para|^NaCP", rownames(vgic_expr_log)) ~ "Sodium",
    grepl("^Sh|^Shaw|^Shab|^Shal|^eag|^erg|^elk", rownames(vgic_expr_log)) ~ "Potassium",
    TRUE ~ "Other"
  ),
  row.names = rownames(vgic_expr_log)
)

# Define colors for channel types
channel_colors <- list(
  Channel_Type = c("Calcium" = "#E41A1C", "Sodium" = "#377EB8",
                   "Potassium" = "#4DAF4A", "Other" = "grey70")
)

# Create heatmap with viridis color scale
heatmap_plot <- pheatmap(
  vgic_expr_log,
  color = viridis(100),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_distance_rows = "correlation",
  clustering_distance_cols = "correlation",
  annotation_row = gene_annotation,
  annotation_colors = channel_colors,
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 8,
  fontsize_col = 8,
  main = "Voltage-Gated Ion Channel Expression Across Visual Cell Types",
  angle_col = 90,
  cellwidth = 15,
  cellheight = 12,
  border_color = NA
)

# Save high-resolution version
png(file.path(img_dir, "vgic_heatmap.png"),
    width = 18, height = 10, units = "in", res = 300)
pheatmap(
  vgic_expr_log,
  color = viridis(100),
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_distance_rows = "correlation",
  clustering_distance_cols = "correlation",
  annotation_row = gene_annotation,
  annotation_colors = channel_colors,
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 8,
  fontsize_col = 8,
  main = "Voltage-Gated Ion Channel Expression Across Visual Cell Types",
  angle_col = 90,
  cellwidth = 15,
  cellheight = 12,
  border_color = NA
)

dev.off()
## quartz_off_screen 
##                 3
cat("\nHeatmap saved to:", file.path(img_dir, "vgic_heatmap.png"), "\n")
## 
## Heatmap saved to: images/tutorial_05/vgic_heatmap.png

Key Observations

The heatmap reveals: - Cell-type specific expression patterns: Different VGICs show distinct expression profiles - Calcium channel diversity: Multiple Ca-α subtypes with varying expression - Potassium channel complexity: Diverse K+ channels enable varied firing properties - Co-expression patterns: Clustered genes may indicate functional relationships

Now let’s focus on Ca-α1T, a low-voltage-activated T-type calcium channel particularly important in visual processing.


Identify Ca-α1T Expressing Cell Types

Extract Ca-α1T Expression

Let’s find which cell types express the voltage-gated calcium channel Ca-α1T.

# Check if Ca-alpha1T is in the expression matrix
if (!"Ca-alpha1T" %in% rownames(expr_matrix)) {
  stop("Ca-alpha1T not found in expression matrix!")
}

# Extract Ca-alpha1T expression across all cell types
ca_alpha1t_expr <- expr_matrix["Ca-alpha1T", , drop = FALSE]
ca_alpha1t_expr <- as.data.frame(t(ca_alpha1t_expr))
colnames(ca_alpha1t_expr) <- "expression"
ca_alpha1t_expr$cell_type <- rownames(ca_alpha1t_expr)

# Sort by expression level
ca_alpha1t_expr <- ca_alpha1t_expr %>%
  arrange(desc(expression))

cat("Ca-α1T expression statistics:\n")
## Ca-α1T expression statistics:
cat("  Mean expression:", mean(ca_alpha1t_expr$expression), "\n")
##   Mean expression: 9.244262
cat("  Median expression:", median(ca_alpha1t_expr$expression), "\n")
##   Median expression: 6.38
cat("  Max expression:", max(ca_alpha1t_expr$expression), "\n")
##   Max expression: 72.42
cat("  90th percentile:", quantile(ca_alpha1t_expr$expression, 0.9), "\n")
##   90th percentile: 19.19
# Show top expressors
cat("\nTop 10 Ca-α1T expressing cell types:\n")
## 
## Top 10 Ca-α1T expressing cell types:
print(ca_alpha1t_expr[1:10, ])
##       expression cell_type
## Dm9        72.42       Dm9
## Pm4        43.87       Pm4
## Dm3a       43.36      Dm3a
## Dm3b       40.81      Dm3b
## Tm2        26.90       Tm2
## Mi1        21.69       Mi1
## C2         19.19        C2
## L5         17.16        L5
## LLPC3      14.70     LLPC3
## LPLC1      13.38     LPLC1

Visualise Ca-α1T Expression

# Define expression threshold (90th percentile)
expr_threshold <- quantile(ca_alpha1t_expr$expression, 0.9)

# Add category for plotting
ca_alpha1t_expr <- ca_alpha1t_expr %>%
  mutate(expressing = expression >= expr_threshold)

# Create bar plot
p_expr <- ggplot(ca_alpha1t_expr, aes(x = reorder(cell_type, expression),
                                       y = expression,
                                       fill = expressing)) +
  geom_col() +
  geom_hline(yintercept = expr_threshold, linetype = "dashed", color = "red", size = 0.8) +
  scale_fill_manual(values = c("FALSE" = "grey70", "TRUE" = "steelblue"),
                    labels = c("FALSE" = "Below threshold", "TRUE" = "High expression"),
                    name = "Expression Level") +
  labs(
    title = "Ca-α1T Expression Across Visual System Cell Types",
    subtitle = paste("Threshold at 90th percentile (", round(expr_threshold, 2), ")", sep = ""),
    x = "Cell Type",
    y = "Expression Level"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 8),
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

save_plot(p_expr, "ca_alpha1t_expression", width = 16, height = 8)
ggplotly(p_expr)
# Identify high expressors
high_expressors <- ca_alpha1t_expr %>%
  filter(expressing) %>%
  pull(cell_type)

cat("\nCell types with high Ca-α1T expression (≥90th percentile):\n")
## 
## Cell types with high Ca-α1T expression (≥90th percentile):
print(high_expressors)
## [1] "Dm9"  "Pm4"  "Dm3a" "Dm3b" "Tm2"  "Mi1"  "C2"

Map to FAFB Connectome

Load FAFB Metadata

# Construct path to FAFB metadata
meta_path <- construct_path(data_path, dataset, "meta")

# Read metadata
meta_full <- read_feather_smart(meta_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/fafb/fafb_783_meta.feather 
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 8.82 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 139215 rows
cat("FAFB metadata loaded:\n")
## FAFB metadata loaded:
cat("  Total neurons:", nrow(meta_full), "\n")
##   Total neurons: 139215
cat("  Columns:", ncol(meta_full), "\n")
##   Columns: 17
# Display first few rows
head(meta_full %>% select(all_of(dataset_id), cell_type, super_class, neurotransmitter_predicted))
## # A tibble: 6 × 4
##   fafb_783_id        cell_type super_class             neurotransmitter_predic…¹
##   <chr>              <chr>     <chr>                   <chr>                    
## 1 720575940630915748 ORN_DM6   sensory                 acetylcholine            
## 2 720575940620993718 ORN_VA1v  sensory                 acetylcholine            
## 3 720575940603453286 M_adPNm5  central_brain_intrinsic acetylcholine            
## 4 720575940630003472 M_adPNm4  central_brain_intrinsic acetylcholine            
## 5 720575940614316731 M_adPNm5  central_brain_intrinsic acetylcholine            
## 6 720575940629587671 M_adPNm5  central_brain_intrinsic acetylcholine            
## # ℹ abbreviated name: ¹​neurotransmitter_predicted

Map Transcriptomic Cell Types to FAFB

The cell type key uses Matsliah_type and Schlegel_type columns for FAFB/BANC naming.

# For FAFB, we'll use Schlegel_type (primary) and Matsliah_type (fallback)
# These map to the cell_type column in FAFB metadata

# Get high expressing cell types from transcriptomics
high_expr_celltypes_transcr <- high_expressors

# Map to FAFB naming using both Schlegel_type and Matsliah_type
fafb_cell_types <- cell_type_key %>%
  filter(subtype2 %in% high_expr_celltypes_transcr) %>%
  select(Schlegel_type, Matsliah_type) %>%
  pivot_longer(cols = everything(), values_to = "fafb_name") %>%
  pull(fafb_name) %>%
  unique()

cat("Mapped", length(fafb_cell_types), "cell type names to FAFB\n")
## Mapped 9 cell type names to FAFB
cat("\nFAFB cell type names:\n")
## 
## FAFB cell type names:
print(fafb_cell_types)
## [1] "C2"   "Dm3"  "Dm3p" "Dm3q" "Dm9"  "Mi1"  "Pm4"  "Pm05" "Tm2"
# Find neurons of these types in FAFB metadata
ca_alpha1t_neurons <- meta_full %>%
  filter(cell_type %in% fafb_cell_types)

cat("\nFound", nrow(ca_alpha1t_neurons), "Ca-α1T+ neurons in FAFB\n")
## 
## Found 6815 Ca-α1T+ neurons in FAFB
# Show cell type distribution
if (nrow(ca_alpha1t_neurons) > 0) {
  cat("\nCell type distribution:\n")
  print(table(ca_alpha1t_neurons$cell_type))

  # Get neuron IDs for connectivity analysis
  ca_alpha1t_ids <- ca_alpha1t_neurons[[dataset_id]]
  cat("\nNumber of Ca-α1T+ neuron IDs:", length(ca_alpha1t_ids), "\n")
} else {
  cat("\nWarning: No matching neurons found in FAFB metadata!\n")
  cat("This may be due to naming differences or sparse annotation.\n")
  cat("Consider using a broader search or different cell type mappings.\n")
}
## 
## Cell type distribution:
## 
##   C2 Dm3p Dm3q  Dm9  Mi1 Pm05  Tm2 
## 1424 1005  902  203 1582  148 1551 
## 
## Number of Ca-α1T+ neuron IDs: 6815

Connectivity Analysis

Load FAFB Edgelist

# Construct path to FAFB edgelist
edgelist_path <- construct_path(data_path, dataset, "edgelist_simple")

# Read edgelist (this may take a minute for large datasets)
cat("Loading FAFB edgelist (this may take 1-2 minutes)...\n")
## Loading FAFB edgelist (this may take 1-2 minutes)...
edgelist_full <- read_feather_smart(edgelist_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/fafb/fafb_783_simple_edgelist.feather 
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 288.61 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 15023799 rows
cat("\nEdgelist loaded:\n")
## 
## Edgelist loaded:
cat("  Total connections:", nrow(edgelist_full), "\n")
##   Total connections: 15023799
cat("  Columns:", paste(names(edgelist_full), collapse = ", "), "\n")
##   Columns: pre, post, count, norm, total_input
# Display structure
head(edgelist_full)
## # A tibble: 6 × 5
##   pre                post               count   norm total_input
##   <chr>              <chr>              <int>  <dbl>       <int>
## 1 720575940630598927 720575940596125868    13 0.0681         191
## 2 720575940638698842 720575940596125868    11 0.0576         191
## 3 720575940616595515 720575940596125868     9 0.0471         191
## 4 720575940625903678 720575940596125868     8 0.0419         191
## 5 720575940644945608 720575940596125868    12 0.0628         191
## 6 720575940621780812 720575940596125868     2 0.0105         191

Filter for Ca-α1T+ Neurons and Strong Partners

We’ll identify the strongest synaptic partners of Ca-α1T+ neurons using two criteria: 1. Normalised weight ≥ 90th percentile (strong relative connectivity) 2. Synapse count > 5 (minimum absolute connectivity)

if (length(ca_alpha1t_ids) > 0) {
  # Filter edgelist for connections involving Ca-α1T+ neurons
  # Include both pre and post connections
  edgelist_ca <- edgelist_full %>%
    filter(pre %in% ca_alpha1t_ids | post %in% ca_alpha1t_ids)

  cat("Connections involving Ca-α1T+ neurons:", nrow(edgelist_ca), "\n")

  # Calculate 90th percentile of normalised weight
  norm_threshold <- quantile(edgelist_ca$norm, 0.9, na.rm = TRUE)
  cat("90th percentile of norm:", norm_threshold, "\n")

  # Filter for strong connections (norm ≥ 90th percentile AND count > 5)
  edgelist_strong <- edgelist_ca %>%
    filter(norm >= norm_threshold, count > 5)

  cat("\nStrong connections (≥90th percentile norm AND count>5):", nrow(edgelist_strong), "\n")

  # Get all neuron IDs in the strong network (Ca-α1T+ neurons + partners)
  network_ids <- unique(c(edgelist_strong$pre, edgelist_strong$post))
  cat("Total neurons in network:", length(network_ids), "\n")

  # Get metadata for network neurons
  network_meta <- meta_full %>%
    filter(!!sym(dataset_id) %in% network_ids) %>%
    mutate(is_ca_alpha1t = !!sym(dataset_id) %in% ca_alpha1t_ids)

  cat("  Ca-α1T+ neurons:", sum(network_meta$is_ca_alpha1t), "\n")
  cat("  Partner neurons:", sum(!network_meta$is_ca_alpha1t), "\n")

} else {
  cat("No Ca-α1T+ neurons found - skipping connectivity analysis\n")
}
## Connections involving Ca-α1T+ neurons: 885607 
## 90th percentile of norm: 0.0411 
## 
## Strong connections (≥90th percentile norm AND count>5): 77905 
## Total neurons in network: 44236 
##   Ca-α1T+ neurons: 6775 
##   Partner neurons: 37456

Network Statistics

if (exists("edgelist_strong") && nrow(edgelist_strong) > 0) {
  # Compute network statistics
  cat("\nNetwork Statistics (Neuron-Level):\n")
  cat("="  , rep("=", 50), "\n", sep = "")

  # Connection counts
  cat("Total connections:", nrow(edgelist_strong), "\n")
  cat("Average synapse count:", mean(edgelist_strong$count), "\n")
  cat("Average normalised weight:", mean(edgelist_strong$norm), "\n")

  # Partner statistics
  ca_outputs <- edgelist_strong %>%
    filter(pre %in% ca_alpha1t_ids) %>%
    count(pre) %>%
    nrow()

  ca_inputs <- edgelist_strong %>%
    filter(post %in% ca_alpha1t_ids) %>%
    count(post) %>%
    nrow()

  cat("\nCa-α1T+ neurons with outputs:", ca_outputs, "\n")
  cat("Ca-α1T+ neurons with inputs:", ca_inputs, "\n")

  # Top partners
  cat("\nTop 10 strongest postsynaptic partners (neuron-level):\n")
  top_post <- edgelist_strong %>%
    filter(pre %in% ca_alpha1t_ids) %>%
    group_by(post) %>%
    summarise(
      total_count = sum(count),
      mean_norm = mean(norm),
      n_connections = n()
    ) %>%
    arrange(desc(mean_norm)) %>%
    head(10) %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, super_class),
              by = setNames(dataset_id, "post"))

  print(top_post)

  cat("\nTop 10 strongest presynaptic partners (neuron-level):\n")
  top_pre <- edgelist_strong %>%
    filter(post %in% ca_alpha1t_ids) %>%
    group_by(pre) %>%
    summarise(
      total_count = sum(count),
      mean_norm = mean(norm),
      n_connections = n()
    ) %>%
    arrange(desc(mean_norm)) %>%
    head(10) %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, super_class),
              by = setNames(dataset_id, "pre"))

  print(top_pre)

  # Cell type-level statistics
  cat("\n\n")
  cat("\nNetwork Statistics (Cell Type-Level):\n")
  cat("="  , rep("=", 50), "\n", sep = "")

  # Add cell type metadata to edgelist
  edgelist_with_types <- edgelist_strong %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
              by = setNames(dataset_id, "pre")) %>%
    rename(pre_celltype = cell_type, pre_is_ca = is_ca_alpha1t) %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
              by = setNames(dataset_id, "post")) %>%
    rename(post_celltype = cell_type, post_is_ca = is_ca_alpha1t)

  # Collapse to cell type level
  celltype_stats <- edgelist_with_types %>%
    group_by(pre_celltype, post_celltype, pre_is_ca, post_is_ca) %>%
    summarise(
      mean_norm = mean(norm, na.rm = TRUE),
      total_count = sum(count, na.rm = TRUE),
      n_neuron_connections = n(),
      .groups = "drop"
    ) %>%
    filter(!is.na(mean_norm))

  cat("Total cell type connections:", nrow(celltype_stats), "\n")
  cat("Average synapse count per cell type pair:", mean(celltype_stats$total_count), "\n")
  cat("Average normalised weight per cell type pair:", mean(celltype_stats$mean_norm), "\n")
  cat("Average neuron connections per cell type pair:", mean(celltype_stats$n_neuron_connections), "\n")

  # Cell type partner statistics
  ca_celltype_outputs <- celltype_stats %>%
    filter(pre_is_ca) %>%
    distinct(pre_celltype) %>%
    nrow()

  ca_celltype_inputs <- celltype_stats %>%
    filter(post_is_ca) %>%
    distinct(post_celltype) %>%
    nrow()

  cat("\nCa-α1T+ cell types with outputs:", ca_celltype_outputs, "\n")
  cat("Ca-α1T+ cell types with inputs:", ca_celltype_inputs, "\n")

  # Top cell type partners
  cat("\nTop 10 strongest postsynaptic partner cell types:\n")
  top_post_celltype <- celltype_stats %>%
    filter(pre_is_ca) %>%
    arrange(desc(mean_norm)) %>%
    head(10) %>%
    select(post_celltype, mean_norm, total_count, n_neuron_connections)

  print(top_post_celltype)

  cat("\nTop 10 strongest presynaptic partner cell types:\n")
  top_pre_celltype <- celltype_stats %>%
    filter(post_is_ca) %>%
    arrange(desc(mean_norm)) %>%
    head(10) %>%
    select(pre_celltype, mean_norm, total_count, n_neuron_connections)

  print(top_pre_celltype)
}
## 
## Network Statistics (Neuron-Level):
## ===================================================
## Total connections: 77905 
## Average synapse count: 22.37877 
## Average normalised weight: 0.1097034 
## 
## Ca-α1T+ neurons with outputs: 5322 
## Ca-α1T+ neurons with inputs: 6627 
## 
## Top 10 strongest postsynaptic partners (neuron-level):
## # A tibble: 10 × 6
##    post               total_count mean_norm n_connections cell_type super_class
##    <chr>                    <int>     <dbl>         <int> <chr>     <chr>      
##  1 720575940608048818           9         1             1 R7        sensory    
##  2 720575940608368050           9         1             1 R7        sensory    
##  3 720575940613188778          19         1             1 R7        sensory    
##  4 720575940613194226          23         1             1 R8        sensory    
##  5 720575940614720040           6         1             1 R8        sensory    
##  6 720575940615021352          14         1             1 R8        sensory    
##  7 720575940615969898          10         1             1 R8        sensory    
##  8 720575940616743436          34         1             1 R7        sensory    
##  9 720575940620682320          24         1             1 R7        sensory    
## 10 720575940621179329          19         1             1 R7        sensory    
## 
## Top 10 strongest presynaptic partners (neuron-level):
## # A tibble: 10 × 6
##    pre                total_count mean_norm n_connections cell_type super_class 
##    <chr>                    <int>     <dbl>         <int> <chr>     <chr>       
##  1 720575940636953445           9     0.6               1 L1        optic_lobe_…
##  2 720575940625365577         130     0.592             2 L1        optic_lobe_…
##  3 720575940612744085          36     0.581             1 L2        optic_lobe_…
##  4 720575940624865463          29     0.547             1 L1        optic_lobe_…
##  5 720575940615608482          85     0.532             2 L1        optic_lobe_…
##  6 720575940605418668         287     0.528             1 L2        optic_lobe_…
##  7 720575940626763210         185     0.520             1 L2        optic_lobe_…
##  8 720575940617341652          97     0.519             1 L2        optic_lobe_…
##  9 720575940623312391         177     0.509             1 L2        optic_lobe_…
## 10 720575940638509631          54     0.505             1 L2        optic_lobe_…
## 
## 
## 
## Network Statistics (Cell Type-Level):
## ===================================================
## Total cell type connections: 395 
## Average synapse count per cell type pair: 4413.716 
## Average normalised weight per cell type pair: 0.08035699 
## Average neuron connections per cell type pair: 197.2278 
## 
## Ca-α1T+ cell types with outputs: 7 
## Ca-α1T+ cell types with inputs: 7 
## 
## Top 10 strongest postsynaptic partner cell types:
## # A tibble: 10 × 4
##    post_celltype mean_norm total_count n_neuron_connections
##    <chr>             <dbl>       <int>                <int>
##  1 R7                0.670       17354                  784
##  2 R8                0.638       14563                  759
##  3 L5                0.615          32                    1
##  4 L3                0.496          54                    5
##  5 R1-6              0.479          23                    3
##  6 L4                0.458          22                    1
##  7 <NA>              0.456          20                    2
##  8 R7                0.414          22                    3
##  9 R8                0.339          26                    3
## 10 R8                0.225          97                   14
## 
## Top 10 strongest presynaptic partner cell types:
## # A tibble: 10 × 4
##    pre_celltype mean_norm total_count n_neuron_connections
##    <chr>            <dbl>       <int>                <int>
##  1 L2               0.334      195640                 1615
##  2 L1               0.299       49716                 1404
##  3 L1               0.271      175754                 1586
##  4 <NA>             0.269          85                    3
##  5 <NA>             0.245         334                    3
##  6 R1-6             0.170           8                    1
##  7 Tm2              0.143          13                    1
##  8 L2               0.136         636                   16
##  9 T4a              0.136          15                    2
## 10 T1               0.133           8                    1

Network Visualisation

Create Network Graph

if (exists("edgelist_strong") && nrow(edgelist_strong) > 0) {
  # Filter for only UPSTREAM connections to Ca-α1T+ neurons
  # (i.e., post must be Ca-α1T+)
  edgelist_upstream <- edgelist_strong %>%
    filter(post %in% ca_alpha1t_ids)

  cat("Filtering for upstream connections to Ca-α1T+ neurons:\n")
  cat("  Total strong connections:", nrow(edgelist_strong), "\n")
  cat("  Upstream to Ca-α1T+:", nrow(edgelist_upstream), "\n")

  # Add cell type and neurotransmitter metadata to edgelist
  edgelist_with_types <- edgelist_upstream %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t, neurotransmitter_predicted),
              by = setNames(dataset_id, "pre")) %>%
    rename(pre_celltype = cell_type, pre_is_ca = is_ca_alpha1t, pre_nt = neurotransmitter_predicted) %>%
    left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
              by = setNames(dataset_id, "post")) %>%
    rename(post_celltype = cell_type, post_is_ca = is_ca_alpha1t)

  # Filter for only presynaptic neurons with GABA or glutamate
  edgelist_inhibitory <- edgelist_with_types %>%
    filter(tolower(pre_nt) %in% c("gaba", "glutamate"))

  cat("  After filtering for GABA/glutamate presynaptic:", nrow(edgelist_inhibitory), "\n")

  # Count neurons per cell type (both pre and post)
  neurons_per_celltype <- network_meta %>%
    group_by(cell_type) %>%
    summarise(n_neurons = n(), .groups = "drop")

  # Replace NA cell types with "unknown"
  edgelist_inhibitory <- edgelist_inhibitory %>%
    mutate(
      pre_celltype = ifelse(is.na(pre_celltype), "unknown", pre_celltype),
      post_celltype = ifelse(is.na(post_celltype), "unknown", post_celltype),
      pre_is_ca = ifelse(is.na(pre_is_ca), FALSE, pre_is_ca),
      post_is_ca = ifelse(is.na(post_is_ca), FALSE, post_is_ca),
      pre_nt = ifelse(is.na(pre_nt), "unknown", pre_nt)
    )

  # Collapse to cell type level
  # Calculate mean norm and total count for each cell type pair
  celltype_edgelist <- edgelist_inhibitory %>%
    group_by(pre_celltype, post_celltype, pre_is_ca, post_is_ca) %>%
    summarise(
      mean_norm = mean(norm, na.rm = TRUE),
      total_count = sum(count, na.rm = TRUE),
      n_neuron_connections = n(),
      .groups = "drop"
    ) %>%
    filter(!is.na(mean_norm))  # Remove any NA weights

  cat("\nCell type-level network:\n")
  cat("  Collapsed to cell type-cell type connections:", nrow(celltype_edgelist), "\n")

  # Create cell type vertex data
  # Get unique cell types from both pre and post
  all_celltypes <- unique(c(celltype_edgelist$pre_celltype, celltype_edgelist$post_celltype))

  # Determine if each cell type contains Ca-α1T+ neurons and count neurons
  celltype_vertices <- data.frame(
    cell_type = all_celltypes,
    stringsAsFactors = FALSE
  ) %>%
    left_join(
      edgelist_inhibitory %>%
        group_by(post_celltype) %>%
        summarise(has_ca_alpha1t = any(post_is_ca), .groups = "drop"),
      by = c("cell_type" = "post_celltype")
    ) %>%
    left_join(neurons_per_celltype, by = "cell_type") %>%
    mutate(
      has_ca_alpha1t = ifelse(is.na(has_ca_alpha1t), FALSE, has_ca_alpha1t),
      n_neurons = ifelse(is.na(n_neurons), 1, n_neurons)  # Default to 1 if unknown
    )

  # Create igraph object at cell type level
  g_celltype <- graph_from_data_frame(
    d = celltype_edgelist %>% select(pre_celltype, post_celltype, mean_norm, total_count, n_neuron_connections),
    directed = TRUE,
    vertices = celltype_vertices
  )

  cat("\nCell type network graph created:\n")
  cat("  Cell types (vertices):", vcount(g_celltype), "\n")
  cat("  Cell type connections (edges):", ecount(g_celltype), "\n")
  cat("  Upstream inhibitory cell types:", sum(!V(g_celltype)$has_ca_alpha1t), "\n")
  cat("  Ca-α1T+ target cell types:", sum(V(g_celltype)$has_ca_alpha1t), "\n")

  # Calculate node degrees
  V(g_celltype)$in_degree <- degree(g_celltype, mode = "in")
  V(g_celltype)$out_degree <- degree(g_celltype, mode = "out")
  V(g_celltype)$total_degree <- degree(g_celltype, mode = "all")

  cat("\nDegree statistics:\n")
  cat("  Mean in-degree:", round(mean(V(g_celltype)$in_degree), 2), "\n")
  cat("  Mean out-degree:", round(mean(V(g_celltype)$out_degree), 2), "\n")

  # Store for plotting
  g <- g_celltype

} else {
  cat("No network data available for visualisation\n")
}
## Filtering for upstream connections to Ca-α1T+ neurons:
##   Total strong connections: 77905 
##   Upstream to Ca-α1T+: 24140 
##   After filtering for GABA/glutamate presynaptic: 9110 
## 
## Cell type-level network:
##   Collapsed to cell type-cell type connections: 120 
## 
## Cell type network graph created:
##   Cell types (vertices): 69 
##   Cell type connections (edges): 120 
##   Upstream inhibitory cell types: 62 
##   Ca-α1T+ target cell types: 7 
## 
## Degree statistics:
##   Mean in-degree: 1.74 
##   Mean out-degree: 1.74

Plot Network

if (exists("g") && vcount(g) > 0) {
  # Set node colors based on Ca-α1T expression
  V(g)$color <- ifelse(V(g)$has_ca_alpha1t, "#E41A1C", "#377EB8")
  V(g)$color_label <- ifelse(V(g)$has_ca_alpha1t, "Ca-α1T+ cell type", "GABA/Glu input")

  # Set node sizes based on NUMBER OF NEURONS in the cell type
  # Use sqrt to scale area proportionally to neuron count
  V(g)$size <- sqrt(V(g)$n_neurons) * 2

  # Set edge widths based on LOG of mean normalised weight
  # Add small constant to avoid log(0)
  E(g)$width <- log10(E(g)$mean_norm + 0.001) - min(log10(E(g)$mean_norm + 0.001))
  E(g)$width <- (E(g)$width / max(E(g)$width)) * 8 + 1  # Scale to 1-9 range

  # Set edge colors based on mean normalised weight using viridis
  edge_colors <- viridis(100)
  norm_scaled <- cut(E(g)$mean_norm, breaks = 100, labels = FALSE)
  E(g)$color <- edge_colors[norm_scaled]

  # Use Fruchterman-Reingold layout for cell type networks
  set.seed(42)
  layout_fr <- layout_with_fr(g)

  # Plot with base R graphics
  par(mar = c(2, 2, 3, 2), bg = "white")
  plot(g,
       layout = layout_fr,
       vertex.color = V(g)$color,
       vertex.size = 3,#V(g)$size,
       vertex.label = V(g)$name,  # Show all cell type names
       vertex.label.cex = 0.7,
       vertex.label.color = "black",
       vertex.label.family = "sans",
       vertex.frame.color = "white",
       edge.arrow.size = 0.5,
       edge.arrow.width = 1,
       edge.width = E(g)$width,
       edge.color = E(g)$color,
       edge.curved = 0.2,
       main = "Inhibitory/Glutamatergic Inputs to Ca-α1T+ Cell Types (FAFB)",
       sub = paste("Cell type network:", vcount(g), "cell types,", ecount(g),
                   "connections | Node size = # neurons | Edge width = log(mean norm)")
  )

  # Add legend
  legend("topleft",
         legend = c("Ca-α1T+ cell type", "GABA/Glu input",
                    "Node size = # neurons", "Edge width = log(mean norm)"),
         col = c("#E41A1C", "#377EB8", NA, NA),
         pch = c(19, 19, NA, NA),
         pt.cex = 2,
         bty = "n",
         cex = 1.1)

  # Save as PNG
  png(file.path(img_dir, "ca_alpha1t_inhibitory_network.png"),
      width = 14, height = 14, units = "in", res = 300)
  par(mar = c(2, 2, 3, 2), bg = "white")
  plot(g,
       layout = layout_fr,
       vertex.color = V(g)$color,
       vertex.size = 3,#V(g)$size,
       vertex.label = V(g)$name,  # Show all cell type names
       vertex.label.cex = 0.7,
       vertex.label.color = "black",
       vertex.label.family = "sans",
       vertex.frame.color = "white",
       edge.arrow.size = 0.5,
       edge.arrow.width = 1,
       edge.width = E(g)$width,
       edge.color = E(g)$color,
       edge.curved = 0.2,
       main = "Inhibitory/Glutamatergic Inputs to Ca-α1T+ Cell Types (FAFB)",
       sub = paste("Cell type network:", vcount(g), "cell types,", ecount(g),
                   "connections | Node size = # neurons | Edge width = log(mean norm)")
  )
  legend("topleft",
         legend = c("Ca-α1T+ cell type", "GABA/Glu input",
                    "Node size = # neurons", "Edge width = log(mean norm)"),
         col = c("#E41A1C", "#377EB8", NA, NA),
         pch = c(19, 19, NA, NA),
         pt.cex = 2,
         bty = "n",
         cex = 1.1)
  dev.off()

  cat("\nInhibitory network plot saved to:", file.path(img_dir, "ca_alpha1t_inhibitory_network.png"), "\n")
}

## 
## Inhibitory network plot saved to: images/tutorial_05/ca_alpha1t_inhibitory_network.png

Cell Type Composition

if (exists("network_meta") && nrow(network_meta) > 0) {
  # Count cell types in network, separated by Ca-α1T expression
  celltype_counts <- network_meta %>%
    filter(!is.na(cell_type)) %>%
    count(cell_type, is_ca_alpha1t) %>%
    arrange(desc(n))

  # Plot cell type composition
  p_celltypes <- ggplot(celltype_counts,
                        aes(x = reorder(cell_type, n), y = n, fill = is_ca_alpha1t)) +
    geom_col() +
    scale_fill_manual(
      values = c("FALSE" = "#377EB8", "TRUE" = "#E41A1C"),
      labels = c("FALSE" = "Partner", "TRUE" = "Ca-α1T+"),
      name = "Neuron Type"
    ) +
    coord_flip() +
    labs(
      title = "Cell Types in Ca-α1T+ Network",
      subtitle = paste("Cell types of", sum(celltype_counts$n), "neurons in the strong connectivity network"),
      x = "Cell Type",
      y = "Number of Neurons"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14),
      axis.text = element_text(size = 10),
      legend.position = "top"
    )

  save_plot(p_celltypes, "ca_alpha1t_network_celltypes", width = 10, height = 8)
  ggplotly(p_celltypes)
}

Summary

In this tutorial, we:

  1. ✓ Loaded transcriptomics data for visual system cell types
  2. ✓ Identified cell types with high Ca-α1T expression (≥90th percentile)
  3. ✓ Mapped transcriptomic cell types to FAFB connectome metadata
  4. ✓ Loaded FAFB edgelist and filtered for strong connections
  5. ✓ Created a network graph of Ca-α1T+ neurons and their strongest synaptic partners
  6. ✓ Visualised the network structure and cell type composition

Key Findings

if (exists("network_meta") && exists("edgelist_strong")) {
  cat("\n=== SUMMARY STATISTICS ===\n\n")

  cat("Transcriptomics:\n")
  cat("  Cell types with high Ca-α1T (≥90th percentile):", length(high_expressors), "\n")

  cat("\nConnectome mapping:\n")
  cat("  Ca-α1T+ neurons found in FAFB:", length(ca_alpha1t_ids), "\n")

  cat("\nConnectivity network:\n")
  cat("  Total neurons in network:", vcount(g), "\n")
  cat("  Total strong connections:", ecount(g), "\n")
  cat("  Network density:", round(edge_density(g), 4), "\n")

  cat("\nBiological interpretation:\n")
  cat("  Ca-α1T is expressed in specific visual system cell types\n")
  cat("  These neurons form a highly connected subnetwork\n")
  cat("  T-type calcium channels likely enable graded signaling in these circuits\n")
}
## 
## === SUMMARY STATISTICS ===
## 
## Transcriptomics:
##   Cell types with high Ca-α1T (≥90th percentile): 7 
## 
## Connectome mapping:
##   Ca-α1T+ neurons found in FAFB: 6815 
## 
## Connectivity network:
##   Total neurons in network: 69 
##   Total strong connections: 120 
##   Network density: 0.0256 
## 
## Biological interpretation:
##   Ca-α1T is expressed in specific visual system cell types
##   These neurons form a highly connected subnetwork
##   T-type calcium channels likely enable graded signaling in these circuits

Next Steps

Possible extensions of this analysis: - Compare Ca-α1T+ networks across different datasets (BANC, maleCNS) - Analyse other ion channel genes (e.g., Ca-α1D, other VGCCs) - Investigate functional implications using neurotransmitter predictions - Examine morphological properties of Ca-α1T+ neurons - Compare expression patterns across developmental stages

References

  • Transcriptomics data: Özel et al. (2023) bioRxiv. Link
  • Ca-α1T gene: FlyBase FBgn0264386
  • FAFB connectome: Dorkenwald et al. (2024) Nature; Schlegel et al. (2024) Nature

Tutorial complete! 🎉